Detecting the socio-economic drivers of confidence in government with eXplainable Artificial Intelligence

The European Quality of Government Index (EQI) measures the perceived level of government quality by European Union citizens, combining surveys on corruption, impartiality and quality of provided services. It is, thus, an index based on individual subjective evaluations. Understanding the most relevant objective factors affecting the EQI outcomes is important for both evaluators and policy makers, especially in view of the fact that perception of government integrity contributes to determine the level of civic engagement. In our research, we employ methods of Artificial Intelligence and complex systems physics to measure the impact on the perceived government quality of multifaceted variables, describing territorial development and citizen well-being, from an economic, social and environmental viewpoint. Our study, focused on a set of regions in European Union at a subnational scale, leads to identifying the territorial and demographic drivers of citizens’ confidence in government institutions. In particular, we find that the 2021 EQI values are significantly related to two indicators: the first one is the difference between female and male labour participation rates, and the second one is a proxy of wealth and welfare such as the average number of rooms per inhabitant. This result corroborates the idea of a central role played by labour gender equity and housing policies in government confidence building. In particular, the relevance of the former indicator in EQI prediction results from a combination of positive conditions such as equal job opportunities, vital labour market, welfare and availability of income sources, while the role of the latter is possibly amplified by the lockdown policies related to the COVID-19 pandemics. The analysis is based on combining regression, to predict EQI from a set of publicly available indicators, with the eXplainable Artificial Intelligence approach, that quantifies the impact of each indicator on the prediction. Such a procedure does not require any ad-hoc hypotheses on the functional dependence of EQI on the indicators used to predict it. Finally, using network science methods concerning community detection, we investigate how the impact of relevant indicators on EQI prediction changes throughout European regions. Thus, the proposed approach enables to identify the objective factors at the basis of government quality perception by citizens in different territorial contexts, providing the methodological basis for the development of a quantitative tool for policy design.


Results
The primary goal of this work is detecting, by using complex network and machine learning methods, the most impactful factors for the prediction of EQI, that quantifies the government quality perceived by European Union citizens for different TL2 regions. In this section, we describe the main findings of our analysis. www.nature.com/scientificreports/ EQI prediction and most influential features. The EQI 2021 index is available for 240 European Union regions, which can be traced back to 197 OECD subregions at Territorial Level 2 (TL2). However, the application of criteria related to the availability of OECD subregional indicators (see "Materials and methods") restricts our analysis to 195 TL2 regions. Such territorial indicators are collected from the OECD Regional Statistics database 5 , and can be classified in four groups: Business and Economy, Demography, Education and Labour, and Territorial and People Well-being. In particular, we choose to work with indicators that are available for at least two thirds of the considered TL2 regions, that are "intensive" (namely, not scaling with the region area or population), and that are not determined by subjective perception. Moreover, we add to the original dataset additional indicators, that we construct by comparing subregional data with the corresponding national values. After undergoing a preprocessing workflow based on indicator availability and non-redundancy criteria, as outlined in detail in "Materials and methods", data are passed to a Boruta feature selection algorithm 6 . Supplementary Data S1-S4 reports a full list of the 612 indicators provided as input to Boruta. Out of them, such a feature selection process identifies 53 that are potentially relevant to predict EQI. Hence, we use the eXtreme Gradient Boosting (XGBoost) regression algorithm, trained in the leave-one-out mode on the 53 selected indicators, to predict the index value in the 195 TL2 regions. In the optimal parameter configuration ( num_parallel_tree = 100 , max_depth = 2 , n_jobs = 100 , see "Materials and methods"), we obtain the results shown in Fig. 2. The effectiveness of regression is quantified by the following figures of merit, with r 2 the coefficient of determination, P the Pearson correlation between predicted and true index (with p the associated p-value), MAE the mean absolute error, and RMSE the root-mean-square error.
We then apply XAI algorithms to determine the most influential variables in the XGBoost regression. For each of the 195 leave-one-out cycles, we collect SHAP values and LIME values, which measure the impact of each indicator on the considered prediction. The results of the SHAP and LIME analyses are shown in Fig. 3, where each line reports a distribution of the values obtained in all cycles for a given variable. The plots, limited to the most influential indicators, highlight the large impact on prediction due to the features PARTICIPATION RATE 15+ GENDER DIFFERENCE (F-M), namely the difference between female and male labour participation rates for citizens over the age of 15, and ROOMS PER CAPITA, namely the average number of rooms per inhabitant. Notice that Supplementary Data S1-S4 contains an explanation of the concise indicator names reported in the text and in the figures.
Competition network of indicators. Competition networks provide a tool to visualize and interpret the results of XAI analysis. Here, we construct two competition networks, starting, respectively, from the SHAP and LIME values associated to the 53 selected indicators for each region.
In a competition network nodes represent indicators, whose tendency to exchange their relative positions in a given set of rankings determines the weight of connections. This means that nodes that consistently place on (1) r 2 = 0.8877,  www.nature.com/scientificreports/ top or bottom of the rankings tend to be characterized by the lowest connectivity (technical details are reported in "Materials and methods"). In our case, the two competition networks are based on 195 rankings determined, respectively, by the lists of absolute SHAP and LIME values associated to each given region. Notice that using the absolute values is necessary, since, as shown in the two panels of Fig. 3, both SHAP and LIME values, even for important features, tend to have vanishing average, since they can affect the outcome of regression either positively or negatively. In Figs. 4, 5, we show an illustration of the competition networks, along with scatter plots in which coordinates are the network degree and the mean absolute SHAP or LIME value, respectively, computed over the 195 regions. Points corresponding to indicators that are relevant in a stable way, for most of the regions, tend to be placed towards the upper left corner of the plot, in which a high mean absolute SHAP or LIME value is accompanied by a small network degree. Here, we find confirmation that both PARTICIPATION RATE 15+ GENDER DIFFERENCE (F-M) and ROOMS PER CAPITA are outstandingly relevant in their capability to predict EQI.
Geographical analysis of SHAP and LIME values with a network of subregions. At this point, we investigate the possible relations between the SHAP and LIME values associated to a given indicator and the general features of the considered regions. To address this issue, we construct a network in which the 195 considered TL2 subregions represent nodes, while edges connecting different regions are meant to highlight their mutual similarity with respect to the set of 53 OECD indicators selected by Boruta. Such a network, whose construction process is detailed in the "Materials and methods" section, consists of 14015 links, whose strength is based on Pearson correlation between the selected indicator sets of the subregions they connect.
We begin the analysis of region networks by investigating the possible relation between the EQI scores of territories and their similarity in terms of the 53 selected OECD indicators. The tendency in a network to connect nodes with similar attribute values is quantified by assortativity (see "Materials and methods" section for definition and characterization). In our case, we find that the region network is characterized by the statistically significant assortativity 0.258 ± 0.006 , associated with a p-value smaller than 10 −9 . Therefore, since connections among TL2 subregions are related in a relevant way to the results achieved in the EQI ranking, it is reasonable to expect that community detection tends to subdivide the network in groups of territories that are mostly homogeneous in their EQI outcomes. This framework allows to investigate the behavior of indicators that mostly impact on EQI prediction, within and across communities, to unveil possible patterns of similarities and differences in their SHAP and LIME values. Community detection in the subregion network provides the following partition: The detailed composition of each community is represented in Fig. 6 and reported in Supplementary Data S5, while principles and intermediate results of the hierarchical procedure leading to the above communities are discussed in "Materials and methods".
The partition in communities represents the tool to analyze the relatedness between community membership and either the SHAP or the LIME values of the top-5 indicators in the respective mean absolute value rankings (see Fig. 3). The indicators involved in the SHAP geographical analysis are the following ones: • % EMPLOYMENT 25-34 WITH L7 EDUCATION, vs nation (employment rate for people in 25-34 year-old age range with master's or equivalent level education-relative difference between subregional and national value); • % LABOUR FORCE WITH SECONDARY EDUCATION (share of labour force with at least secondary education); • % EARLY LEAVERS (rate of early leavers from education and training, in % of the total population aged 18 to 24).
On the other hand, the indicators involved in the LIME geographical analysis are the following ones:  www.nature.com/scientificreports/ Figure 5. Top: competition network of the selected OECD indicators, determined from the importance rankings of their mean absolute LIME values for each subregion considered in the study. Bottom: scatter plot illustrating the relation between the mean absolute LIME value and the degree in the competition network, for indicator; labels are reported only for features whose mean absolute LIME value exceeds 0.06. In both panels, the color of points indicates the number of regions for which the considered indicator is available in the dataset.
In the top panel, the size of each node increases with the mean absolute LIME value of the corresponding indicator. As in the case of SHAP values, the results confirm that both PARTICIPATION RATE 15+ GENDER DIFFERENCE (F-M) and ROOMS PER CAPITA are outstandingly relevant in their capability to predict EQI. The meaning of all the feature labels is explained in the Supplementary Data S1-S4. The network representation in top panel is generated with Gephi 0.9.5 48 . In order to quantify the relatedness between SHAP/LIME values and community membership, we use the resolution ratio R, a quality factor introduced for a similar analysis in Ref. 38 , that increases as the distributions of a given index in different communities are more separated (see "Materials and methods" for definition and characterization). The resolution ratio is much larger than one when the community index distributions have a vanishing or limited overlap with each other, and much smaller than one when the overlap is practically full. In the intermediate case R ≃ 1 the separation between the mean values of neighboring community distributions is comparable to the typical variation of the index within each community. The value R = 1 can thus be assumed as a threshold to distinguish cases ( R > 1 ) in which indicators show similar behaviors in regions belonging to the same community and relevant differences across communities, from cases ( R < 1 ) in which no such territorial pattern can be observed.

Scientific Reports
We determine the resolution ratio R for the SHAP and LIME value distributions of the indicators, listed above, that are ranked as top-5 in terms of SHAP and LIME mean absolute values, respectively. The results reported in Figure 6. Complex network of regional similarities determined from the 53 OECD indicators selected by Boruta; node colors indicate the community membership of the 195 subregions. It is evident that, with the exception of few subregions, communities are very much influenced by geographical proximity. The map is generated with the "Map of Countries" and "GeoLayout" plugins of Gephi 0.9.5 48  www.nature.com/scientificreports/

Discussion
The various stages of our analysis highlight new findings on how territorial variables measuring development and individual well-being are related to the perception of government quality. In general, the study confirms that citizens' evaluations on such an issue are often not directly related to the administrative action, but rather determined by the general objective conditions of a territory and people well-being. The assortativity analysis on the subregion network suggests that regions for which the 53 selected indicators are highly-correlated tend to have comparable EQI values. This confirms the good predictive power of these indicators, independently emerging from XGBoost regression. The analysis of SHAP/LIME values related to such a prediction, corroborated by the indicator competition networks, points out the predominant role played by two indicators, namely PARTICIPATION RATE 15+ GENDER DIFFERENCE (F-M) and ROOMS PER CAPITA. It is also interesting that, as one can observe from Fig. 3, the impact on predictions of indicators related to the comparison with national values is limited (but not irrelevant), suggesting that the discrepancies between local conditions and those of the rest of a country play, on average, a minor role.
The emerging relevance of PARTICIPATION RATE 15+ GENDER DIFFERENCE (F-M) and ROOMS PER CAPITA in determining the perceived quality of government is highly noteworthy. The former indicator represents the difference between the labour participation rates, counting people that are either employed or actively searching for work, related to the female and male populations. The importance of such an indicator can be traced back to the fact that different relevant aspects contribute to its outcome, such as gender equity in society, which has one of its most convincing expressions in equal job opportunities, an open and vital labour market, welfare initiatives supporting maternity, and the availability of a larger number of income sources per household. In fact, despite the rising female involvement in the job market since the postwar period, the reversing education gaps and stricter gender equality legislation recently introduced in several OECD countries, residual gender gaps in labour participation seem to persist. Recent empirical evidence shows that there are two main reasons that might explain this phenomenon: the first one is the prevalent role of women in childcare and housework, along with its impact on work-life balance; the second one is represented by general attitudes toward gender roles in society and gender identity 49 . The combined effect of such strongly related phenomena on EQI through PARTICIPATION RATE 15+ GENDER DIFFERENCE (F-M) may be even amplified. Actually, women's burden in childcare and housework may in turn affect employer evaluations, promotion standards and wages. On the other hand, progressive attitudes toward gender roles are positively associated with more equal gender outcomes in the labour market 50 . Thus, the impact of PARTICIPATION RATE 15+ GENDER DIFFERENCE (F-M) on perceived quality of government can be explained as the outcome of two interrelated effects: the efficiency of labour market policies that support women's participation, such as adequate and affordable childcare services, and the ability of public institutions to hamper the transmission of gender norms and stereotypes across generations.
On the other hand, the relevance of ROOMS PER CAPITA reflects the widely recognized importance of the housing problem in the management of social pressure 51 . First of all, this quantity is an established proxy of living conditions. Eurostat, the statistical office of the European Commission 52 , uses overcrowding, defined as the lack of an adequate number of rooms, as one of the variables to characterize severe housing deprivation, among other factors such as leaking roof, rot in window frames or floor, lack of bathtub or shower unit and indoor flushing toilet for sole use of the household, and too dark dwellings. Scientific literature also shows that overcrowding is the dimension that contributes most to determine housing deprivation 53 . Therefore, increasing the number of rooms per capita may help individuals and households to overcome severe material deprivation; this in turn may act by improving satisfaction with the government in place. For instance, individuals may perceive that government interventions play a key role in helping them to gain freedom from material deprivation, thus increasing Table 1. Resolution ratios R of the SHAP (LIME) distributions with respect to the partition of TL2 subregions in network communities, associated to the top-5 indicators in terms of mean absolute SHAP (LIME) values. Resolution ratios larger than 1, highlighted in boldface, indicate an average tendency of distributions pertaining to different communities to separate from each other. Missing entries correspond to the distributions of indicators that do not rank among top-5 in terms of either SHAP or LIME mean absolute values. www.nature.com/scientificreports/ the quality of government perceived. A further aspect is the relevance of the number of rooms per capita in view of the health and socio-economic consequences of the recent COVID-19 pandemic. Actually, not only house overcrowding may amplify the spread of respiratory infections by preventing individuals from keeping the necessary distance, but it also affects quality of daily life and remote working, in case of lockdown or quarantine. By contrast, it is acknowledged that living in under-occupied dwellings is associated with lower levels of anxiety, and higher life satisfaction 54 . Thus, it can be argued that increasing the number of rooms per capita enhances the health and socio-economic conditions of individuals, which may induce them to express approval towards government lockdown interventions to fight the pandemic, with a consequent positive impact on the EQI score. The picture is completed by the outcome of the geographical analysis, which highlights strength and weaknesses of local government policies throughout Europe. Actually, it is possible to observe from Figs. 7-8 that the values of PARTICIPATION RATE 15+ GENDER DIFFERENCE (F-M) and ROOMS PER CAPITA have a negative influence on the predicted EQI, as quantified by the negative SHAP/LIME values, for communities involving regions in south-eastern Europe, while they positively affect EQI elsewhere. It is noteworthy that the impact of such indicators in Spanish and Portuguese regions is entirely different, in positive, from that in countries such as Italy and Greece, which are frequently associated to them in terms of economic conditions.
A strength of this research work lies in its model-agnostic nature, that does not require ad hoc hypotheses on the functional dependence of the target variable (EQI, in this case) on the indicators used to predict it. The proposed approach enables to identify the objective factors that, in a variety of different territorial contexts, contribute to determine the perception of government quality by citizens, providing the methodological basis for the development of a quantitative tool for policy design. The analysis carried out in this paper can be useful from a policy perspective as it provides information on the variables over which one can operate in order to improve the citizens' opinion on the quality of government. This study demonstrates that job gender equity and housing may be key determinants of EQI. We finally remark that the adopted methodology, which follows the paradigm of Artificial Intelligence for Social Good 55 to investigate the relation between an index of social relevance and context variables, can be applied to different areas of the civic life, such as health, regarding in particular the incidence of specific diseases throughout the population 56,57 , and crime-related issues.

Materials and methods
Experimental design. Our analysis is aimed at investigating the main factors affecting the EQI, which quantifies the quality of government perceived by citizens of European Union regions on a subnational scale. The study, based on machine learning and complex network methods, consists of the following steps, summarized in Fig. 1: • collecting (1) values of territorial indicators characterizing the OECD TL2 European Union regions, combining in a reasonable way data abundance and recentness, (2) values of the EQI for the same regions; • removing irrelevant and redundant information from the set of collected territorial indicators trough a preprocessing pipeline and the Boruta feature selection algorithm; • training a regression algorithm, capable of predicting the EQI values for each of the 195 subregions from the selected TL2 OECD indicators; • implementing the XAI approach to identify through SHAP and LIME values the features that are most important to determine the EQI prediction; • constructing competition networks, in which indicators represent nodes, to compare their impact, quantified by the mean absolute SHAP and LIME values, on the EQI prediction, taking into account the different data availability across regions; • analysing the geographical variability in the relation between territorial indicators and EQI values by means of a complex network with nodes coinciding with the 195 considered TL2 subregions and connections among them based on similarity in terms of the selected OECD indicators; verifying the assortative behavior of such a subregion network with respect to the EQI values; • using the unsupervised partition of the subregion network, obtained by a community detection algorithm, to divide regions in homogeneous groups in terms of territory indicators, in order to find relations between SHAP/LIME values and community membership.
In the following, we detail the implementation of the aforementioned process.
Data collection and preprocessing. The EQI ranking for the year 2021 is retrieved from the dedicated website 1 . Details on the computation of this index for the considered year are reported in the working paper 58 by the Quality of Government (QoG) Institute of University of Gothenburg. The EQI 2021 is available for 240 European Union regions, mostly corresponding to the OECD subregions at Territorial Level 2 (TL2). Exceptions are represented by Germany and Belgium, for which the EQI is reported with a finer geographical resolution, where some of the TL2 subregions are split in smaller parts. In these cases, we associate to a given TL2 subregion a weighted average of EQI values reported for its parts, with a weight proportional to population (details on the merged regions are provided in the Supplementary Information). In such a way, we associate an EQI value to 197 OECD subregions at TL2. Territorial indicators are collected from the OECD Regional Statistics database 5 . We perform a preliminary availability check, in which we select only those indicators containing data for at least two thirds of the 197 TL2 regions. This process yields 1056 indicators, that we subdivide in four groups: 594 indicators in the Business and Economy (BE) group, 191 in Demography (D), 245 in Education and Labour (EL), and 26 in Territorial and People Well-being (TPW). After noticing that data are entirely missing in all groups for region CY00 (Cyprus), www.nature.com/scientificreports/ and in the TPW group for FRY (French overseas), we exclude these two regions from the analysis, which hereafter involves 195 regions. Inside each of the aforementioned categories, we select the "intensive" indicators, namely the ones that do not directly depend on either the territorial extension or the population of a territory. This selection process leaves 292 indicators in the BE group, 58 in D, 183 in EL, and 17 in TPW. The remaining quantities are then subjected to a further selection inside each category, which eliminates those correlated to another indicator by a Pearson correlation larger than 0.9, which are assumed to carry redundant information. In cases of statistical redundancy, we retain the indicator with the wider data availability. This selection leaves 189 indicators in BE, 24 in D, 85 in EL, and 17 in TPW group.
At this point, we enrich the dataset with the "vs nation" indicators, representing the comparison of local values with the corresponding national counterparts. Specifically, each one of them is defined as the difference between the value associated to a given indicator I and TL2 subregion s belonging to country C, and the national value I(C) of the same indicator, normalized to the latter. In principle, this operation could be done for all the indicators remaining in the dataset. However, we have to discard cases in which the national value is vanishing or unavailable, eventually obtaining 181 additional "vs nation" indicators in the BE group, 22 in D, 83 in EL, and 17 in TPW.
We conclude indicator preprocessing by observing that the TPW dataset contains three quantities that are determined by perception rather than objective data, namely SELF-EVALUATION OF LIFE SATISFACTION, PERCEPTION OF CORRUPTION, PERCEIVED SOCIAL NETWORK SUPPORT. In order to minimize the conceptual overlap between features and the EQI index and avoid data leakage issues, we discard these three indicators, along with the corresponding ones compared with the national values.
Feature selection. The data processing described in the previous subsection leads to an overall set of 612 indicators, whose lists, one for each category (BE, D, EL, TPW), are reported in Supplementary Data S1-S4. Before training the regression algorithms, we discard irrelevant features by applying the Boruta framework. Boruta is a feature selection wrapper method 6 that enables to reduce at the same time noise and data redundancy, by selecting a set of uncorrelated features that improves in a statistically significant way the performance of a given machine learning algorithm. In our case, we make the standard choice of basing Boruta on a supervised learning Random Forest (RF) algorithm, which makes the results of feature selection independent of those obtained by XAI, where a different procedure is used. Boruta associates to the original feature set an equal number of shadow features, obtained by randomly shuffling the original set of indicator values. Such an extended dataset is then used to train a RF algorithm, which evaluates the importance of all features, including original and shadow ones, in predicting a given quantity. In this way, the most important features are identified, after independent random shuffling runs, as the ones that are more relevant, in a statistically significant way, than their shadow counterparts. This implies that the Boruta feature selection process is independent of any arbitrary importance threshold.
Regardless of the machine learning algorithm on which it is based, Boruta is not able to direclty handle unavailable feature values. Therefore, only in the context of Boruta feature selection, we choose to fill all the voids of a given indicator with the average of the available values. Boruta is run with estimator = estimator_forest, n_estimators = 'auto' , max_iter = 100 ; the RF algorithm is run with parameters n_jobs = −1 , max_depth = 5 ; the remaining parameters are set to default values. Out of 612 features, Boruta identifies 53 relevant ones, which are forwarded to XGBoost regression.
XGBoost algorithm. The XGBoost algorithm, used in the regression task to predict EQI from the 53 indicators selected by Boruta, was chosen in order to automatically manage unavailable values, without artificial filling. This machine learning approach is based on an ensemble of decision trees, weak prediction models that are iteratively trained through a gradient boosting pipeline, in which new decision trees specialize in addressing the critical points of the previous ones. In this process, the XGBoost algorithm is able to tackle the problem of missing values by sparsity-aware split finding, which exploits the data sparsity pattern in a unified way, learning the best direction to take when the feature needed for the split is missing 59 . We determined the best performance of regression in the leave-one-out mode by exploring combinations of the following XGBoost parameters: • num_parallel_tree ∈ {50, 100, 250}, • max_depth ∈ {2, 5, 7}, • n_jobs ∈ {1, 5, 10, 100}, with importance_type set to "gain" mode and squared error as the default objective function. The optimal configuration is found for num_parallel_tree= 100 , max_depth= 2 , independently of n_jobs. The regression outcomes corresponding to these values are reported in the "Results" section, and illustrated in Fig. 2. The variability with respect to the best performance quantifiers in the explored parameter space is small, with standard deviations amounting to 1.7% of the optimal r 2 , 0.9% of the optimal P, 4.1% of the optimal MAE, and 6.4% of the optimal RMSE; notice that the results do not depend on the n_jobs parameter. www.nature.com/scientificreports/ Explainable Artificial Intelligence. We describe here the principles and the operation of the XAI algorithms employed in this research, namely SHAP and LIME. SHAP (SHapley Additive exPlanation) is an algorithm based on the concept of the Shapley values derived from the cooperative game theory 60,61 . It is a local model-agnostic post-hoc explainer, hence it can learn local interpretable linear models for the samples, ignoring the specific regressors (or classifiers) to explore the contribution of each feature on the prediction of each sample. The Shapley value for a feature j is assessed as the difference between the prediction of the model output by including that feature and the prediction of the model without that feature, considering all possible feature subsets with and without j. In order to assess the Shapley values, the model should be retrained on all feature subsets F, included in the total set S of features ( F ⊆ S ). For a generic model f, if f x (F) denotes the prediction for the instance x given the subset F without the j-th feature, and f x (F ∪ j) denotes the prediction resulting by adding the j-th feature to the subset F, the marginal contribution of adding the j-th feature value to F can be computed as the difference f x (F ∪ j) − f x (F) . The Shapley value, measuring the impact of the j-th feature on the prediction for x, is then assessed by considering all possible subsets of feature values: where |F|! represents the number of permutations of features positioned before the j-th feature, (|S| − |F| − 1)! represents the number of permutations of feature values that appear after the j-th feature value and |S|! is the total number of permutations 60 . LIME (Local Interpretable Model-agnostic Explanations) represents a different model-agnostic explanation method, whose goal is to associate to a regressor (or a classifier) f an interpretable model g, found through an optimization process 62 . The input of g is represented by the simplified input features z ′ ∈ Z , namely binary vectors in which a component is 1 if the corresponding feature is made available to f, and 0 otherwise. The procedure is based on finding, for a given input instance x, the parameters of g that minimize the objective function In the above equation, π x (z ′ ) is a weight function which adds a higher penalty for vectors z ′ that are farther from the simplified feature vector x ′ , whose components are either 1 or 0 according to whether the corresponding feature is available or not in the input x. The function �(g) , instead, quantifies the complexity of the model (e.g., the number of non-zero weights in a linear model, or the tree depth in decision trees), that is usually fixed a priori. If g is a linear model, the weights of features in the prediction for input x are identified as the corresponding linear coefficients. In this work, we employed a linear model through the function LimeTabularExplainer of the "lime" package for Python 63 , setting the regression mode and using default settings for the other parameters; the function uses, in default mode, an exponential weight in the Euclidean distance between vectors x ′ and z ′ , whose width is set to 0.75 times the square root of the total number of features.
Indicator competition network construction. The indicator competition network is based on comparing their rankings in the 195 SHAP or LIME lists associated to the EQI prediction for each considered region. For definiteness, we focus on the case of SHAP, the procedure followed for LIME being identical. For each pair of indicators i and j, corresponding to nodes in the network, we can associate the values SHAP i (α) and SHAP j (α) , with α ∈ {1, 2, . . . , 195} labelling the region. Then, we construct a 195-element vector V ij , whose α-th entry is 0 if SHAP i (α) > SHAP j (α) , and 1 in the complementary case. The variety of entries in V ij is quantified by its Shannon entropy, which constitutes the weight of the edge connecting nodes i and j. Therefore, if i is more relevant than j in the EQI prediction for all regions, then V ij will contain only zeros, and its entropy will be vanishing. The same occurs in the opposite case, in which j is always more relevant. In these two extreme cases, there is no link between i and j (i.e., they do not compete with each other). Such a network construction process provides 1357 links for SHAP and 1265 links for LIME.

Region network construction.
Using the 53 indicators selected by Boruta, a region network is constructed, whose nodes correspond to the 195 considered TL2 regions. In principle, in this network, each pair of regions can be connected, with edge weight provided by the Pearson correlation between the two vectors containing their territorial indicators. However, we choose to discard a given edge in case the null hypothesis of uncorrelated sets of indicators cannot be rejected at a significance level of 5% (i.e., when p > 0.05 , according to the test based on comparing the sample Pearson correlation with the exact distribution of the correlation values between two random vectors, independent and normally distributed). Therefore, the region network is not complete, as reported in the "Results" section, and edges can have both positive and negative weights. Notice that each indicator distribution is normalized in a way that 0 and 1 correspond to the minimum and maximum values, respectively.
Assortativity. Since we generally deal with weighted network, we consider the generalized definition of assortativity with respect to an attribute X 45 , which takes the value x i in correspondence of the i-th node: www.nature.com/scientificreports/ In the above definition, w ij is the weight of the edge (i, j), s i = j w ij is the strength of node i, and W = ij w ij . The expression of r w remains meaningful only in the case of positive weights. Therefore, before evaluating assortativity, we associate to the considered network, which can contain negative-weight edges, an auxiliary subnetwork where only positive-weight edges are retained. The values of assortativity range from −1 (maximally antiassortative network) to +1 (maximally assortative). A network consisting of two same-size and fully connected components, with no connection between them and characterized by two different values of an attribute, is maximally assortative. A network consisting of two subsets with the same cardinality, each characterized by a specific attribute value, with no internal connection but with each node only connected to nodes of the other subset, is maximally antiassortative.
The definition (8) is formally equivalent to the weighted Pearson correlation between two vectors of length 2m, with m the number of edges, whose entries coincide respectively with the attributes x i and x j of the nodes at the ends of each edge (i, j); the contribution of a given pair (x i , x j ) to the overall correlation is determined by the weight w ij of the corresponding edge. If r W = 0 , there is no relevant linear correlation between the attribute vectors of nodes connected by edges. The interpretation of assortativity in terms of a weighted Pearson correlation allows to associate to r w the standard error that is evaluated, along with the related p-value, based on the Student t-distribution hypothesis 64 . The assortativity and the associated standard errors and p-values are computed through an algorithm implemented in the "weights" R library 65 .
Community detection. Community detection is performed using the Spin Glass algorithm 66,67 , with the implementation mode that enables to consider links with negative weights. The resolution γ is treated as a free parameter, varied in the interval [0.8, 1] with step width 0.05. The other parameters of the algorithm are fixed to default values: the upper limit for the number of communities is equal to the number of network nodes (195 in the region networks); the argument, that specifies the balance between the importance of present and missing negatively-weighted edges within a community, is set to 0.01; the null model is a random graph with the same vertex degrees as the input graph; the cooling factor for the simulated annealing process is set to 0.5.
For a given set of parameters, we perform K = 100 runs of the algorithm, each one with a different seed of the pseudorandom number generator. The partition in communities is then chosen by majority voting. However, in order to evaluate the stability of the detected partition in communities, we do not just count the frequency of the most frequently recurring partition. Instead, we introduce a new stability criterion, that takes into account the similarity between different partitions {p j } (j=1,..,K) , based on the average Normalized Mutual Information where NMI(p a , p b ) is the Normalized Mutual Information between a given pair of partitions, and K(K − 1)/2 is the number of distinct pairs. The majority partition over K = 100 runs is approved under the condition �NMI� ≥ 0.90 , related to the general stability of the community detection. Moreover, the majority partition must satisfy the following further requirements: 1. it must not be trivial (i.e., consisting of a single community, coinciding with the whole network); 2. it must not be too fragmented, containing communities whose cardinality is less than 5% of the cardinality of the whole network (i.e., less than 10 nodes).
If the results obtained for 100 runs, at different values of the resolution γ , satisfy the above conditions, we choose the output with larger NMI , and the majority partition corresponding to this choice is identified as the result of community detection. Community detection is performed following a hierarchical procedure: at the first step, the entire network is partitioned according to the above method. Then a second community detection algorithm is performed on each of the first-level communities, and so on to the next levels. A branch of network partitions interrupts if at least one of conditions 1 and 2 is violated; this does not affect the progress of hierarchical partition for other branches.
Considering that the hierarchical community detection on the subregion network does not find partitions beyond the second step, we label each community with a capital "C", followed by a number and a lower-case letter; communities labelled with the same number derive from the same community found at the first hierarchical level. For example, communities C1a and C1b are obtained at the second hierarchical level by subdividing the same community "1" found at the first hierarchical level. Communities C2a and C3a, instead, do not derive from the same first-level community, and the coincidence of their lower-case letters is accidental.
Resolution ratio. The resolution ratio R can be used to quantitatively characterize the separation between ranked index distributions pertaining to different communities. The quantity is defined in terms of mean and variance of the community distributions and of the overall distribution. The definition of the resolution ratio R is based on the relation between the values {x i } assigned to each element i = 1, . . . , N of a given set and the partition of that set in K disjoint groups of cardinality n c , with c = 1, . . . , K . Given the values and the partition, one can evaluate the overall mean µ and variance σ 2 of the whole set {x i } , as well as the mean µ c and variance σ 2 c for each group c. The overall variance σ 2 can be separated in two positive contributions Since σ 2 int coincides with a weighted average of group variances, while σ 2 ext represents the contribution determined by the discrepancy between the group means and the overall mean, a good indicator of separation of group distributions is given by R = σ 2 ext /σ 2 int .

Data availability
The data that support the findings of this study are either publicly available on databases cited in the bibliography, or reported in Supplementary Data files. The source code and the data required to run it are available at the following URL: https:// doi. org/ 10. 5281/ zenodo. 75042 74.